Smart Paper Notes

Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

Categories: Machine Learning | Machine Learning (Statistics)

2022-12-30

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.

Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann, Annika Reinke, Vivienn Weru, Minu Dietlinde Tizabi, Fabian Isensee, Tim J. Adler, Patrick Godau, Veronika Cheplygina, Michal Kozubek, Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, Kaisheng Ma

Categories: Computer Vision

2022-12-16

The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to obtain in 3D than for 2D images or natural language. This motivates using models pretrained on modalities other than 3D as teachers for cross-modal knowledge transfer. In this paper, we revisit masked modeling in a unified fashion of knowledge distillation, and we show that foundational Transformers pretrained with 2D images or natural languages can help self-supervised 3D representation learning through training Autoencoders as Cross-Modal Teachers (ACT). The pretrained Transformers are transferred as cross-modal 3D teachers using discrete variational autoencoding self-supervision, during which the Transformers are frozen with prompt tuning for better knowledge inheritance. The latent features encoded by the 3D teachers are used as the target of masked point modeling, wherein the dark knowledge is distilled to the 3D Transformer students as foundational geometry understanding. Our ACT pretrained 3D learner achieves state-of-the-art generalization capacity across various downstream benchmarks, e.g., 88.21% overall accuracy on ScanObjectNN. Code will be released at https://github.com/RunpeiDong/ACT.

NVIDIA FLARE: Federated Learning from Simulation to Real-World

Holger R. Roth, Yan Cheng, Yuhong Wen, Isaac Yang, Ziyue Xu, Yuan-Ting Hsieh, Kristopher Kersten, Ahmed Harouni, Can Zhao, Kevin Lu

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-10-24

Federated learning (FL) enables the building of robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package, and allows researchers to bring their data science workflows implemented in any training libraries (PyTorch, TensorFlow, XGBoost, or even NumPy) and apply them in real-world FL settings. This paper introduces the key design principles of FLARE and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.

A general class of combinatorial filters that can be minimized efficiently

Yulin Zhang, Dylan A. Shell

Categories: Robotics

2022-09-10

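State minimization of combinatorial filters is a fundamental problem that arises, for example, in building cheap, resource-efficient robots. Exact minimization, however, is known to be NP-hard. This paper conducts a more nuanced analysis of this hardness than has been done so far and uncovers two factors that contribute to the complexity. We show that each factor is a distinct source of the problem's hardness, and are thereby able to shed light on the roles played by (1) the structure of the graph that encodes compatibility relationships, and (2) determinism-enforcing constraints. Just as a line of prior work has sought to introduce additional assumptions and identify sub-classes that lead to practical state reduction, we next use this new, sharper understanding to explore special cases for which exact minimization is efficient. We introduce a new algorithm for constraint repair that applies to a large sub-class of filters, subsuming three distinct special cases for which optimal minimization in polynomial time was previously known to be possible. While the efficiency in each of these three cases previously appeared to stem from seemingly dissimilar properties, their commonality becomes clear when seen through the lens of the present work. We also provide entirely new families of filters that are efficiently reducible.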

Facilitating Global Team Meetings Between Language-Based Subgroups: When and How Can Machine Translation Help?

Yongle Zhang, Dennis Asamoah Owusu, Marine Carpuat, Ge Gao

Categories: Natural Language Processing

2022-09-07

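Global teams frequently consist of language-based subgroups that pool complementary information to achieve common goals. Previous research outlines a two-step work communication flow in these teams. There are team meetings held in a required common language (i.e., English); in preparation for those meetings, people hold subgroup conversations in their native languages. Work communication at team meetings is often less effective than in subgroup conversations. In the current study, we investigate the idea of leveraging machine translation (MT) to facilitate global team meetings. We hypothesize that exchanging subgroup conversation logs before a team meeting provides contextual information that benefits teamwork at the meeting. MT can translate these logs, which makes them comprehensible at low cost. To test our hypothesis, we conducted a between-subjects experiment in which 20 quartets of participants performed a personnel selection task. Each quartet included two English native speakers (NS) and two non-native speakers (NNS) whose native language was Mandarin. All participants began the task with subgroup conversations in their native languages, then proceeded to a team meeting in English. We manipulated the exchange of subgroup conversation logs prior to the team meeting: MT-mediated exchange versus no exchange. Analysis of participants' subjective experience, task performance, and depth of discussion as reflected in their conversational moves jointly indicates that team meeting quality improves when subgroup conversation logs are exchanged via MT rather than not exchanged. We conclude with reflections on when and how MT can be applied to enhance global teamwork across language barriers.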

LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

Kai Zhang, Chongyang Tao, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, Daxin Jiang

Categories: Natural Language Processing

2022-08-29

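Retrieval models based on dense representations in semantic space have become an indispensable branch of first-stage retrieval. These retrievers benefit from rapid advances in representation learning toward compressive global sequence-level embeddings. However, they are prone to overlooking local salient phrases and entity mentions in text, which usually play pivotal roles in first-stage retrieval. To mitigate this weakness, we propose aligning a dense retriever with a well-performing lexicon-aware representation model. The alignment is achieved through weakened knowledge distillation that enlightens the retriever in two ways: 1) a lexicon-augmented contrastive objective that challenges the dense encoder, and 2) a pair-wise rank-consistent regularization that makes the dense model's behavior lean toward that of the lexicon-aware model. We evaluate our model on three public benchmarks. The results show that, with a comparable lexicon-aware retriever as the teacher, the proposed dense retriever brings consistent and significant improvements and even outperforms its teacher. In addition, we find that our improvement to the dense retriever is complementary to standard ranker distillation, which can further lift state-of-the-art performance.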

Adam Can Converge Without Any Modification on Update Rules

Yushun Zhang, Congliang Chen, Naichen Shi, Ruoyu Sun, Zhi-Quan Luo

Categories: Machine Learning

2022-08-20

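Ever since Reddi et al. 2018 pointed out the divergence issue of Adam, many new variants have been designed to obtain convergence. However, vanilla Adam remains exceptionally popular and works well in practice. Why is there a gap between theory and practice? We point out a mismatch between the settings of theory and practice: Reddi et al. 2018 pick the problem after picking the hyperparameters of Adam, i.e., $(\beta_1, \beta_2)$, while practical applications often fix the problem first and then tune $(\beta_1, \beta_2)$. Based on this observation, we conjecture that the empirical convergence can be theoretically justified only if we change the order of picking the problem and the hyperparameters. In this work, we confirm this conjecture. We prove that, when $\beta_2$ is large and $\beta_1 < \sqrt{\beta_2} < 1$, Adam converges to a neighborhood of critical points. The size of the neighborhood is proportional to the variance of the stochastic gradients. Under an extra condition (the strong growth condition), Adam converges to critical points. As $\beta_2$ increases, our convergence result can cover any $\beta_1 \in [0,1)$, including $\beta_1 = 0.9$, which is the default setting in deep learning libraries. Our result shows that Adam can converge under a wide range of hyperparameters without any modification of its update rules. To our knowledge, we are the first to prove this result without strong assumptions such as bounded gradients. When $\beta_2$ is small, we further point out a large region of $(\beta_1, \beta_2)$ where Adam can diverge to infinity. Our divergence result considers the same setting as our convergence result, indicating a phase transition from divergence to convergence as $\beta_2$ increases. These positive and negative results can provide suggestions on how to tune Adam's hyperparameters.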
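
The hyperparameter condition in this abstract can be made concrete with a small sketch. The code below is an illustration only, not the authors' code: it uses the common textbook form of Adam with bias correction on a deterministic one-dimensional quadratic, which may differ from the exact variant and stochastic setting analyzed in the paper. The function names `satisfies_beta_condition` and `adam_minimize` are hypothetical. The check covers only the inequality $\beta_1 < \sqrt{\beta_2} < 1$; how large $\beta_2$ must be depends on the problem and is not captured here.

```python
import math


def satisfies_beta_condition(beta1, beta2):
    """Check only the inequality beta1 < sqrt(beta2) < 1 from the abstract.

    Assumption: this does not test whether beta2 is "large enough",
    which the abstract states as a separate, problem-dependent requirement.
    """
    return 0.0 <= beta1 < math.sqrt(beta2) < 1.0


def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Run unmodified Adam (textbook form with bias correction) on a scalar."""
    x, m, v = float(x0), 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy problem (an assumption for illustration): f(x) = x^2, so grad f(x) = 2x.
x_final = adam_minimize(lambda x: 2.0 * x, x0=5.0)
default_ok = satisfies_beta_condition(0.9, 0.999)   # library defaults
small_beta2_ok = satisfies_beta_condition(0.9, 0.5)  # sqrt(0.5) < 0.9
```

With the library defaults $(0.9, 0.999)$ the inequality holds, while $(0.9, 0.5)$ violates it because $\sqrt{0.5} \approx 0.707 < 0.9$. A toy run like this says nothing about the divergence region described in the abstract; it only shows where the two hyperparameters enter the update.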

Confidence-Guided Learning Process for Continuous Classification of Time Series

Chenxi Sun, Moxian Song, Derun Can, Baofeng Zhang, Shenda Hong, Hongyan Li

Categories: Machine Learning

2022-08-14

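In the real world, the class of a time series is usually labeled only at the final time point, but many applications require classifying a time series at every time point. For example, the outcome of a critically ill patient is determined only at the end, yet the patient should be diagnosed at all times so that treatment is timely. We therefore propose a new concept: Continuous Classification of Time Series (CCTS). It requires the model to learn from data at different time stages. However, a time series evolves dynamically, which leads to different data distributions. When a model learns multiple distributions, it tends to forget or overfit. We suggest that meaningful learning scheduling is possible because of an interesting observation: measured by confidence, the process by which a model learns multiple distributions is similar to the process by which a human learns multiple pieces of knowledge. We thus propose a novel Confidence-guided method for CCTS (C3TS). It imitates the alternating human confidence described by the Dunning-Kruger effect. We define an objective confidence to arrange the data and a self-confidence to control the learning duration. Experiments on four real-world datasets show that C3TS is more accurate than all baselines for CCTS.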

FDNeRF: Few-shot Dynamic Neural Radiance Fields for Face Reconstruction and Expression Editing

Jingbo Zhang, Xiaoyu Li, Ziyu Wan, Can Wang, Jing Liao

Categories: Computer Vision

2022-08-11

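We propose a Few-shot Dynamic Neural Radiance Field (FDNeRF), the first NeRF-based method capable of reconstruction and expression editing of 3D faces from a small number of dynamic images. Unlike existing dynamic NeRFs, which require dense images as input and can only model a single identity, our method enables face reconstruction across different persons from few-shot inputs. Compared with state-of-the-art few-shot NeRFs designed for modeling static scenes, the proposed FDNeRF accepts view-inconsistent dynamic inputs and supports arbitrary facial expression editing, i.e., producing faces with novel expressions beyond those in the input. To handle the inconsistencies between dynamic inputs, we introduce a well-designed conditional feature warping (CFW) module that performs expression-conditioned warping in 2D feature space and is also identity-adaptive and 3D-constrained. As a result, features of different expressions are transformed into those of the target expression. We then construct a radiance field from these view-consistent features and use volumetric rendering to synthesize novel views of the modeled faces. Extensive experiments with quantitative and qualitative evaluation demonstrate that our method outperforms existing dynamic and few-shot NeRFs on both 3D face reconstruction and expression editing tasks. Our code and model will be made available upon acceptance.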
